“What is a statistical model?”
knitr::include_graphics("op.png")
# packages
# ```{r install_rethinking}
# function for installing dependencies
ipak <- function(pkg){
new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
if (length(new.pkg))
install.packages(new.pkg, dependencies = TRUE)
sapply(pkg, require, character.only = TRUE)
}
# usage
packages <- c("coda", "plyr", "mvtnorm", "scales", "dagitty")
ipak(packages)
   coda    plyr mvtnorm  scales dagitty
   TRUE    TRUE    TRUE    TRUE    TRUE
# next install rethinking
if (!require(rethinking)) {
devtools::install_github("rmcelreath/rethinking")
}
# libraries
library("tidyverse")
library("patchwork")
library("brms")
library("lubridate")
library("splines")
if (!require(equatiomatic)) {
remotes::install_github("datalorax/equatiomatic")
}
# set theme
# theme set
theme_set(theme_classic())
# Import data
# read data
nz_0 <- readr::read_csv2(url("https://raw.githubusercontent.com/go-bayes/psych-447/main/data/nz/nz.csv"))
# to relevel kessler 6 variables
f<-c("None Of The Time","A Little Of The Time","Some Of The Time", "Most Of The Time", "All Of The Time")
# get data into shape
library("tidyverse")
nz <- nz_0 %>%
dplyr::mutate_if(is.character, factor) %>%
select(
-c(
SWB.Kessler01,
SWB.Kessler02,
SWB.Kessler03,
SWB.Kessler04,
SWB.Kessler05,
SWB.Kessler06
)
) %>%
dplyr::mutate(Wave = as.factor(Wave)) %>%
dplyr::mutate(FeelHopeless = forcats::fct_relevel(FeelHopeless, f)) %>%
dplyr::mutate(FeelDepressed = forcats::fct_relevel(FeelDepressed, f)) %>%
dplyr::mutate(FeelRestless = forcats::fct_relevel(FeelRestless, f)) %>%
dplyr::mutate(EverythingIsEffort = forcats::fct_relevel(EverythingIsEffort, f)) %>%
dplyr::mutate(FeelWorthless = forcats::fct_relevel(FeelWorthless, f)) %>%
dplyr::mutate(FeelNervous = forcats::fct_relevel(FeelNervous, f)) %>%
dplyr::mutate(date = make_date(year = 2009, month = 6, day = 30) + TSCORE) %>%
dplyr::mutate(height_m = HLTH.Height * 100,
weight_kg = HLTH.Weight) # better height vars
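The releveling above matters because factor() orders levels alphabetically by default. A minimal sketch on a toy factor (the level names follow the Kessler response scale above; the toy data are made up):

```r
library(forcats)

# factor() sorts levels alphabetically, which scrambles the response scale
x <- factor(c("Some Of The Time", "None Of The Time", "All Of The Time"))
levels(x)

# fct_relevel restores the intended ordering; levels in `f` that are absent
# from the toy data trigger a warning but are simply skipped
f <- c("None Of The Time", "A Little Of The Time", "Some Of The Time",
       "Most Of The Time", "All Of The Time")
levels(forcats::fct_relevel(x, f))
```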
In Part 1 we will introduce regression.
In Part 2 we will:
By learning regression, you will be better equipped to do psychological science and to evaluate psychological research.
By learning how to simulate data, you will better understand what a regression model is doing, be able to evaluate your regression model, and be able to plan research.
Some preliminaries.
Science begins with a question about the world. The first step in science is to clarify what you want to know.
Because science is a social practice, you will also need to clarify why your question is interesting. Put simply, so what?
Next time you read a published study, ask yourself whether the authors have clarified what they want to know, and so what.
Sometimes scientists are interested in specific features of the world: how did virus x originate?
More typically, scientists seek generalisations. How do infectious diseases evolve?
For generalisations, we need a theory that takes us beyond individual cases.
We evaluate a model, in part, by testing its predictions.
A statistical model uses probability theory to generalise from features of a sample to features of a population. Put simply, statistics is organised guessing.
Broadly speaking, a regression model is a statistical tool for guessing wisely about some feature of interest in the world.
Regression is a sensible method for statistical guessing.
Let’s start with nomenclature.
In regression, we call the feature of interest in the world a “parameter.”
We call guessing about this parameter “inference.” Inference is possible because parameters are structured by distributions. Today we will be talking about the normal distribution.
In most cases, the parameter is some quality in a population. Today we’ll be interested in guessing the heights of a population of New Zealanders.
A sample is the sub-population you have measured.
set.seed(123)
sm<-rnorm(100, mean = 170, sd = 20)
ggplot2::qplot(sm, binwidth = 10)
Small sample
set.seed(123)
subsm <-rnorm(10, mean = 170, sd = 20)
ggplot2::qplot(
subsm, binwidth = 10
)
Large sample
set.seed(123)
largesm <-rnorm(1e5, mean = 170, sd = 20)
ggplot2::qplot(
largesm, binwidth = 1
)
What is the expected average for the population from which the sample is drawn?
N = 100
sjPlot::tab_model(
lm(sm ~ 1)
)
| sm | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 171.81 | 168.19 – 175.43 | <0.001 |
| Observations | 100 | ||
| R2 / R2 adjusted | 0.000 / 0.000 | ||
N = 10
sjPlot::tab_model(
lm(subsm ~ 1)
)
| subsm | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 171.49 | 157.85 – 185.14 | <0.001 |
| Observations | 10 | ||
| R2 / R2 adjusted | 0.000 / 0.000 | ||
N = 100,000
sjPlot::tab_model(
lm(largesm ~ 1)
)
| largesm | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 170.02 | 169.90 – 170.14 | <0.001 |
| Observations | 100000 | ||
| R2 / R2 adjusted | 0.000 / 0.000 | ||
What do we notice about the relationship between sample size and the estimated population average?
sjPlot::tab_model(
lm(sm ~ 1),
lm(subsm ~ 1),
lm(largesm ~ 1)
)
| sm | subsm | largesm | |||||||
|---|---|---|---|---|---|---|---|---|---|
| Predictors | Estimates | CI | p | Estimates | CI | p | Estimates | CI | p |
| (Intercept) | 171.81 | 168.19 – 175.43 | <0.001 | 171.49 | 157.85 – 185.14 | <0.001 | 170.02 | 169.90 – 170.14 | <0.001 |
| Observations | 100 | 10 | 100000 | ||||||
| R2 / R2 adjusted | 0.000 / 0.000 | 0.000 / 0.000 | 0.000 / 0.000 | ||||||
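The confidence intervals narrow as the sample grows because the standard error of the mean shrinks with the square root of the sample size. A quick sketch using the population sd of 20 from the simulations above (the 95% CI half-width is roughly 1.96 times the standard error):

```r
# standard error of the mean: se = sd / sqrt(n)
se <- function(n, sd = 20) sd / sqrt(n)
se(10)   # ~6.3  -> wide CI, as in the N = 10 table
se(100)  # 2.0
se(1e5)  # ~0.06 -> the very tight CI in the N = 100,000 table
```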
This data set is from “The heredity of height,” by Karl Pearson and Alice Lee (Pearson and Lee 1903).
md_df <- data.frame(read.table(url("https://raw.githubusercontent.com/avehtari/ROS-Examples/master/PearsonLee/data/MotherDaughterHeights.txt"), header=TRUE))
dplyr::glimpse(md_df)
Rows: 5,524
Columns: 2
$ daughter_height <dbl> 52.5, 52.5, 53.5, 53.5, 55.5, 55.5, 55.5, 55…
$ mother_height <dbl> 59.5, 59.5, 59.5, 59.5, 59.5, 59.5, 59.5, 59…
Does mother’s height predict daughter’s height? It seems so. But how close is the relationship?
Pearson and Lee collected 5,524 data points from mothers and daughters. Let’s examine the data, first by plotting the relationship.
What is happening here?
explore_md <-ggplot2::ggplot(data = md_df, aes(y = daughter_height, x = mother_height)) +
geom_jitter(alpha = .2) +
labs(title = "The relationship between mothers height and daughter's height") +
ylab("Daughter's height") +
xlab("Mother's height") + theme_classic()
explore_md
Is there a linear predictive relationship between these two variables? In regression we examine the line of best fit.
m1 <- lm(daughter_height ~ mother_height, data = md_df)
sjPlot::tab_model(m1)
| daughter_height | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 29.80 | 28.25 – 31.35 | <0.001 |
| mother_height | 0.54 | 0.52 – 0.57 | <0.001 |
| Observations | 5524 | ||
| R2 / R2 adjusted | 0.252 / 0.252 | ||
We can plot the coefficient, which in a model with one predictor isn’t too informative.
t_m1<-parameters::model_parameters(m1,
ci = 0.95)
plot(t_m1) +
labs(title = "The relationship between mothers height and daughter's height") +
ylab("Daughter's height")
Let’s write the equation out in mathematics. How do we read this?1
library("equatiomatic")
extract_eq(m1, use_coefs = FALSE)
\[ \operatorname{daughter\_height} = \alpha + \beta_{1}(\operatorname{mother\_height}) + \epsilon \]
The math says that the expected daughter’s height is \(\alpha\), the average height of the population when mother’s height is set to zero units (note: this is impossible; we’ll come back to this), plus \(\beta\) units of daughter’s height (inches) for each additional unit of mother’s height (inches), plus some error \(\epsilon\).
We can plug the output of the model directly into the equation as follows:
library("equatiomatic")
extract_eq(m1, use_coefs = TRUE)
\[ \operatorname{\widehat{daughter\_height}} = 29.8 + 0.54(\operatorname{mother\_height}) \]
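We can compute a prediction by hand from these rounded coefficients. For instance, for a mother of 63.5 inches (a value chosen here just for illustration):

```r
# expected daughter height = intercept + slope * mother's height
29.8 + 0.54 * 63.5
#> [1] 64.09
# slightly off the model's own prediction, because the printed coefficients
# are rounded; predict(m1, newdata = data.frame(mother_height = 63.5))
# uses full precision
```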
library(ggeffects)
toplot<-ggeffects::ggpredict(m1, terms = "mother_height")
toplot
# Predicted values of daughter_height
# x = mother_height
x | Predicted | 95% CI
----------------------------------
52.50 | 58.41 | [58.15, 58.66]
54.50 | 59.50 | [59.29, 59.70]
57.50 | 61.13 | [60.99, 61.27]
59.50 | 62.22 | [62.13, 62.32]
61.50 | 63.31 | [63.25, 63.38]
63.50 | 64.40 | [64.34, 64.47]
65.50 | 65.49 | [65.40, 65.59]
70.50 | 68.22 | [68.01, 68.42]
heightplot<-plot(toplot, add.data = TRUE, dot.alpha = .1, jitter = TRUE) + theme_classic()
heightplot + labs(title = "Predicted values of daughter's height from the Pearson/Lee 1903 dataset")
Jyoti Amge is the world’s shortest woman at 25 inches. Sandy Allen was the world’s tallest woman at 91 inches. What would be the expected heights of their daughters, and of every woman of intermediate height?
# use the `expand.grid` command to create a sequence of points for mother's height
ndat<-expand.grid(mother_height = c(25:91))
# use the `predict` function to create a new response
pr<- predict(m1, type = "response", interval = "confidence", newdata =ndat)
# have a look at the object
dplyr::glimpse(pr)
 num [1:67, 1:3] 43.4 44 44.5 45.1 45.6 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:67] "1" "2" "3" "4" ...
..$ : chr [1:3] "fit" "lwr" "upr"
# create a new dataframe for the new sequence of points for mother's height and the predicted data
newdata<-data.frame(ndat,pr)
head(newdata)
  mother_height      fit      lwr      upr
1 25 43.42183 42.49099 44.35266
2 26 43.96676 43.06065 44.87288
3 27 44.51170 43.63030 45.39310
4 28 45.05664 44.19995 45.91332
5 29 45.60157 44.76960 46.43355
6 30 46.14651 45.33924 46.95378
Graph the predicted results
# graph the expected results
predplot<-ggplot(data = newdata,
aes(x= mother_height, y = fit)) +
geom_point() + geom_errorbar(aes(ymin = lwr, ymax = upr), width = .1) +
expand_limits(x = c(20,91), y = c(0,81)) + theme_classic() +
labs(title = "Predicted values for a broader population")
# plot the two graphs together (making the x and y axis at the same scale )
library("patchwork")
# rescale heightplot
# old plot with the new axis and y axis scales, and remove points
heightplot2<-plot(toplot, add.data = FALSE) + theme_classic()
nhp <- heightplot2 + expand_limits(x = c(20,91), y = c(0,81) ) + labs(title = "Predicted values of daughter's height from the Pearson/Lee 1903 dataset")
# double graph
nhp /predplot + plot_annotation(title = "What do you notice about these relationships?", tag_levels = "a")
\[\begin{bmatrix} \textbf{y}\\ \textit{daughter's height}\\ 51.5\\ 52.5 \\ 53.0 \\ \vdots \end{bmatrix} = \begin{bmatrix} \textbf{intercept} & \textbf{x}\\ & \textit{mother's height}\\ 1 & 59.5\\ 1 & 57.5 \\ 1 & 60.0 \\ \vdots & \vdots \end{bmatrix} \begin{bmatrix} \textbf{b}\\ 29.8 \\ 0.54 \end{bmatrix}\]
# Simulate nonlinear relationship between x and y
b <- c(2, 0.75)
set.seed(12)
x <- rnorm(100)
set.seed(12)
y <- rnorm(100, mean = b[1] * exp(b[2] * x))
dat1 <- data.frame(x, y)
ot1 <-lm(y ~ x, data = dat1)
# performance::check_model(ot1)
# Plot linear effect
plot(ggeffects::ggpredict(ot1, terms = "x"), add.data = TRUE, dot.alpha = .4)
Non-linear effect
library(splines)
ot2 <-lm(y ~ x + I(x^2), data = dat1)
#performance::check_model(ot2)
plot(ggeffects::ggpredict(ot2, terms = "x"), add.data = TRUE, dot.alpha = .4)
library(splines)
ot3 <-lm(y ~ bs(x), data = dat1)
#performance::check_model(ot2)
plot(ggeffects::ggpredict(ot3, terms = "x"), add.data = TRUE, dot.alpha = .4)
# simulate heights
# simulate weights
set.seed(123)
weight_sim = runif(n = 100, min = 40, max = 100)
a <- 60
b <- rnorm(n=100, 1.4,.1)
mu <- a + weight_sim * b + rnorm(100,0,10)
height_sim = rnorm(100,mu,10)
df <- data.frame(height_sim, weight_sim)
Explore data
ggplot(data = df,
aes(y = height_sim, x = weight_sim)
) +
geom_point() +
geom_smooth(method = lm)
People use the word “effect,” but that is not what regression gives us (by default).
rnorm
rnorm is R’s random number generator for the normal distribution. Within this function:
- n specifies the number of observations
- mean specifies the value of the mean
- sd specifies the value of the standard deviation
set.seed(12345)
# generate random numbers
ds <- rnorm(n = 1000, mean = 0, sd = 1)
dplyr::glimpse(ds)
 num [1:1000] 0.586 0.709 -0.109 -0.453 0.606 ...
We can create a histogram:
p1 <- ggplot2::qplot(ds) + labs(title = "1st random number list")
p1
We use shorthand for generating numbers:
set.seed(54321)
ds_0 <- rnorm(1000)
dplyr::glimpse(ds_0)
 num [1:1000] -0.179 -0.928 -0.784 -1.651 -0.408 ...
Note that the first and the second graphs differ:
p2 <- ggplot2::qplot(ds_0) + labs(title = "2nd random number list")
p1 + p2 + plot_annotation("The two graphs differ", tag_levels = 'i')
Or more formally we can ask R to test the equivalence:
identical(ds, ds_0)
[1] FALSE
Because we want to have reproducible code, we will use the set.seed() function in R to ensure the same random numbers are generated each time.
set.seed(123)
t1 <-stats::rnorm(100)
set.seed(123)
t2 <-stats::rnorm(100)
# test
identical(t1, t2)
[1] TRUE
runif
We use runif (random uniform) to generate continuous data within a given range.
set.seed(123)
ds1 <- runif(n =100, min = 0, max = 50)
dplyr::glimpse(ds1)
 num [1:100] 14.4 39.4 20.4 44.2 47 ...
hist(ds1)
Say we want to simulate a range of values between two endpoints. This is useful for simulating explanatory variables.
set.seed(123)
exp <- runif(n = 100, min = 130, max = 220)
dplyr::glimpse(exp)
hist(exp)
rep
Frequently we’ll need to generate random factors. For this, R’s rep function, letters function, and LETTERS function make happy friends.
Here’s how these functions work
Lower case letters:
letters[1:3]
[1] "a" "b" "c"
Upper case letters:
LETTERS[4:10]
[1] "D" "E" "F" "G" "H" "I" "J"
Creating sequences using each
rep( letters[1:3], each = 3 )
[1] "a" "a" "a" "b" "b" "b" "c" "c" "c"
Creating sequences using times
rep( letters[1:3], times = 3 )
[1] "a" "b" "c" "a" "b" "c" "a" "b" "c"
Create a sequence using length.out
rep( letters[1:3], length.out = 5 )
[1] "a" "b" "c" "a" "b"
Creating uneven sequences
rep( letters[1:3], times = c(3, 1, 4) )
[1] "a" "a" "a" "b" "c" "c" "c" "c"
combining each + times
rep(letters[1:3], each = 2, times = 3)
 [1] "a" "a" "b" "b" "c" "c" "a" "a" "b" "b" "c" "c" "a" "a" "b" "b"
[17] "c" "c"
length.out
rep(letters[1:3], each = 2, length.out = 17)
 [1] "a" "a" "b" "b" "c" "c" "a" "a" "b" "b" "c" "c" "a" "a" "b" "b"
[17] "c"
Note that length.out takes priority over times; use length.out if you have a fixed vector length.
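A minimal illustration of that priority rule:

```r
# both arguments supplied: length.out wins, times is ignored
rep(letters[1:3], times = 10, length.out = 4)
#> [1] "a" "b" "c" "a"
```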
seq
Recall the relationship between mothers’ heights and daughters’ heights in the Pearson/Lee dataset:
# recall this model
explore_md <-ggplot2::ggplot(data = md_df, aes(y = daughter_height, x = mother_height)) +
geom_jitter(alpha = .2) +
labs(title = "The relationship between mothers height and daughter's height") +
ylab("Daughter's height") +
xlab("Mother's height") + theme_classic()
explore_md
What would a random relationship look like?
av_dh <-mean(md_df$daughter_height, na.rm=TRUE)
sd_dh <-sd(md_df$daughter_height, na.rm=TRUE)
av_mh <-mean(md_df$mother_height, na.rm=TRUE)
sd_mh <-sd(md_df$mother_height, na.rm=TRUE)
# number of obs
N<- nrow(md_df)
# fake data
sim_dh = rnorm(N, av_dh, sd_dh)
sim_mh = rnorm(N, av_mh, sd_mh)
sim_df_md <- data.frame(sim_dh,sim_mh)
fake_md <-ggplot2::ggplot(data = sim_df_md, aes(y = sim_dh, x = sim_mh)) +
geom_jitter(alpha = .2) +
labs(title = "Fake data relationship between mothers height and daughter's height") +
ylab("Daughter's height") +
xlab("Mother's height") + theme_classic() +
geom_smooth(method = lm)
# real data
explore_md <-ggplot2::ggplot(data = md_df, aes(y = daughter_height, x = mother_height)) +
geom_jitter(alpha = .2) +
labs(title = "The relationship between mothers height and daughter's height") +
ylab("Daughter's height") +
xlab("Mother's height") + theme_classic() +
geom_smooth(method = lm)
library(patchwork)
fake_md + explore_md + plot_annotation(tag_levels = "a")
We can build a model using fake data
set.seed(123)
height_sim = rnorm(n = 100, mean = 170, sd = 40)
weight_sim = runif(n = 100, min = 40, max = 100)
m0 <- lm(height_sim ~ weight_sim)
summary(m0)
Call:
lm(formula = height_sim ~ weight_sim)
Residuals:
Min 1Q Median 3Q Max
-92.671 -25.541 -0.767 24.036 82.127
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 165.6587 14.8807 11.132 <2e-16 ***
weight_sim 0.1151 0.2085 0.552 0.582
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 36.64 on 98 degrees of freedom
Multiple R-squared: 0.003097, Adjusted R-squared: -0.007076
F-statistic: 0.3044 on 1 and 98 DF, p-value: 0.5824
plot(ggeffects::ggpredict(m0, terms = c("weight_sim")), add.data = TRUE, dot.alpha = .1)
We can use vectors within random number generation
set.seed(123)
vdf<-rnorm(n = 20, mean = c(0, 500, 1000), sd = c(5,50,100))
qplot(vdf, binwidth = 4)
Small samples can appear to reveal relationships that are not there.
set.seed(123)
# no relationship between x and y
x = rnorm(n = 10, mean = 0, sd = 1)
y = rnorm(n = 10, mean = 0, sd = 1)
df<-data.frame(x,y)
ggplot2::ggplot(df, aes(y, x)) + geom_point() + geom_smooth(method = lm)
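To see how easily chance produces apparent relationships at n = 10, we can repeat the experiment many times and collect the correlations (a sketch; the 1000 repetitions are an arbitrary choice):

```r
set.seed(123)
# draw two unrelated variables of length 10, record their correlation, repeat
sim_cor <- replicate(1000, cor(rnorm(10), rnorm(10)))
# even with no true relationship, many runs show sizeable correlations
mean(abs(sim_cor) > 0.3)
hist(sim_cor, main = "Correlations between two unrelated variables, n = 10")
```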
### Simulate a relationship between two variables
library(ggplot2)
N = 1000
weight <-runif(N, min = 50, max =100)
b <- rlnorm(N,.78,.1)
b
 [1] 2.403021 2.157979 2.034313 2.122099 2.438699 2.304824 2.468640
 [ ... 1000 values; output truncated ... ]
sigma <- runif(N, 0 , 10 )
height = weight * b
height
  [1] 137.30863 152.62804 143.79785 145.24125 140.52331 131.23739
  [ ... 1000 values; output truncated ... ]
[631] 110.63109 150.77858 125.69851 142.38525 178.21032 175.17573
[637] 190.63628 134.80582 141.38775 145.28767 199.00615 180.90663
[643] 198.42484 203.41722 199.56827 138.21451 171.94116 137.79779
[649] 141.84021 158.61166 117.86598 203.91593 152.59742 126.44662
[655] 138.62568 186.45965 119.96264 119.45684 130.43320 136.81425
[661] 203.76546 128.07393 111.11290 193.85222 154.67693 200.04628
[667] 155.35604 145.59417 149.37775 125.09382 172.27529 203.92612
[673] 225.42139 191.60472 148.75545 153.52700 102.87924 126.59363
[679] 200.73526 125.43261 164.63440 173.70624 206.95860 189.47115
[685] 186.78459 213.75777 182.52224 196.52344 186.24908 114.89926
[691] 157.69904 195.94800 184.56458 185.60768 126.12999 155.87535
[697] 158.34228 186.95052 128.41492 93.15874 226.60448 113.41554
[703] 92.18089 163.46752 138.60070 181.56519 175.36822 130.94960
[709] 192.39571 220.38710 127.98157 165.68005 124.40336 97.38909
[715] 166.44739 133.92037 158.00470 197.45588 121.05594 145.24314
[721] 153.54952 146.17276 198.14264 149.48187 160.36393 133.18843
[727] 127.27515 162.73256 211.25701 187.46308 232.15974 210.47332
[733] 190.89612 204.91098 140.00226 123.41630 130.04053 162.85639
[739] 176.25321 198.73718 147.86661 213.98292 107.89031 174.89879
[745] 199.07055 177.27433 136.97387 128.24186 139.84908 233.79619
[751] 230.38612 178.86947 138.63859 157.97342 194.32731 157.09810
[757] 120.97580 153.49177 146.88552 136.38715 152.82863 146.93099
[763] 121.46307 90.61070 120.16531 213.17594 158.10518 142.95151
[769] 156.30277 175.56260 122.19352 220.76344 148.12513 238.60503
[775] 195.25240 125.70095 145.91487 112.58521 172.52175 186.53654
[781] 170.89032 138.49845 156.13722 225.00272 198.38871 125.20788
[787] 140.10769 116.15198 132.94760 189.12115 125.25904 183.77795
[793] 171.42803 174.14162 173.27835 179.16206 173.63935 152.45488
[799] 159.60144 124.07127 201.91325 155.14981 212.76076 171.45576
[805] 172.63734 149.94979 234.03480 102.61464 130.18725 103.70683
[811] 134.38488 200.39725 211.02206 183.11105 162.25800 130.00215
[817] 231.13750 105.75916 114.84940 197.20142 152.01069 131.56587
[823] 169.53186 148.12194 151.59353 127.88251 112.33553 126.59626
[829] 150.85159 174.06205 122.60591 110.55058 177.57684 182.01660
[835] 222.16912 223.08000 110.45649 159.02476 191.07470 185.41310
[841] 146.05108 211.85535 204.81173 135.28308 169.38331 208.29148
[847] 176.57709 154.50769 143.89145 139.90229 162.58262 167.92092
[853] 207.29684 146.35133 179.87615 204.33479 98.06919 142.53308
[859] 101.83675 208.66087 219.40170 175.50714 211.18457 182.62317
[865] 193.98444 135.38691 189.42948 209.95460 213.48953 103.77969
[871] 104.16246 103.08363 226.29609 195.38575 155.04987 189.13377
[877] 181.58116 187.40206 153.50747 136.00686 176.94626 106.33106
[883] 159.77087 193.95033 189.46496 209.13466 155.78700 194.36954
[889] 104.63441 189.30045 207.15043 139.15813 171.81766 188.73981
[895] 220.19574 129.76074 175.12580 201.94783 223.71185 119.13513
[901] 212.30435 180.36585 156.02021 165.31152 112.33254 148.41978
[907] 153.23272 159.30650 152.82948 138.31542 184.07301 107.59974
[913] 223.86821 118.44008 114.30549 222.10243 144.48583 123.65096
[919] 146.96331 145.08737 164.38767 181.38390 166.15615 131.47836
[925] 215.32882 152.13862 141.89194 99.80883 203.08326 160.69185
[931] 233.87787 120.83056 216.10054 158.22410 133.09133 188.61190
[937] 231.77488 125.95595 159.10912 161.43819 145.79211 200.83059
[943] 220.14703 189.09582 216.45638 187.84823 123.16000 134.74009
[949] 132.19017 143.87020 115.19547 116.98607 155.67385 150.52888
[955] 142.64983 168.55788 199.98930 143.27166 154.94984 119.58612
[961] 158.53816 185.75183 132.25616 227.56495 186.57544 173.40071
[967] 182.43205 141.30477 145.04354 173.11909 172.20692 130.91613
[973] 119.49333 206.55239 99.65380 197.65653 186.00085 198.24160
[979] 178.59657 137.27190 146.81800 189.62369 158.02518 161.97654
[985] 220.97924 177.91163 178.92646 145.45791 153.32269 185.69182
[991] 131.13306 144.04374 170.38156 138.02158 171.74370 178.14670
[997] 121.94114 182.11697 153.82077 152.55914
# simulated height/weight data
df <- data.frame(height, weight)
m0 <- lm(height ~ weight, data = df)
plot(ggeffects::ggpredict(m0, terms = c("weight")),
     add.data = TRUE,
     dot.alpha = .8) +
  labs(x = "simulated weight",
       y = "simulated height",
       title = "simulated linear relationship")
Simulate a non-linear model
a <- 160
b <- c(2, 0.75)
x <- rnorm(100)
y <- rnorm(100, mean = b[1] * exp(b[2] * x))
dat1 <- data.frame(x, y)
plot(y ~ x, data = dat1)
Polynomial
N <- 1000
# simulate weights
weight <- runif(N, min = 60, max = 120)
weight_c <- scale(weight, scale = FALSE) # centre the weights
# simulate coefficients
a <- rnorm(N, mean = 180, sd = 10)
b1 <- rnorm(N, mean = 2.2, sd = .01)
b2 <- -rnorm(N, mean = .02, sd = .001)
height <- a + b1 * weight_c + b2 * weight_c^2
# simulated height/weight data
df1 <- data.frame(height, weight_c, weight)
plot(height ~ weight)
m1 <- lm(height ~ weight_c, data = df1)
plot(ggeffects::ggpredict(m1, terms = c("weight_c")),
     add.data = TRUE,
     dot.alpha = .2) +
  labs(title = "simulated linear relationship") +
  xlab("simulated weight") + ylab("simulated height")
Non-linear model
m2 <- lm(height ~ weight_c + I(weight_c^2), data = df1)
plot(ggeffects::ggpredict(m2, terms = c("weight_c")),
     add.data = TRUE,
     dot.alpha = .2) +
  labs(title = "simulated quadratic relationship") +
  xlab("simulated weight") + ylab("simulated height")
m3 <- lm(height ~ bs(weight_c), data = df1) # B-spline basis from the splines package
summary(m3)
Call:
lm(formula = height ~ bs(weight_c), data = df1)
Residuals:
Min 1Q Median 3Q Max
-32.035 -6.296 -0.396 6.564 34.931
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 98.339 1.216 80.87 <2e-16 ***
bs(weight_c)1 68.558 3.544 19.35 <2e-16 ***
bs(weight_c)2 111.865 2.391 46.78 <2e-16 ***
bs(weight_c)3 127.830 2.051 62.31 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 10 on 996 degrees of freedom
Multiple R-squared: 0.9333, Adjusted R-squared: 0.9331
F-statistic: 4645 on 3 and 996 DF, p-value: < 2.2e-16
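A quick way to compare the three fits is an information criterion; a sketch, assuming `m1`, `m2`, and `m3` have been fit as above:

```r
# compare the linear, quadratic, and spline fits by AIC (lower is better);
# m1, m2, m3 are assumed to be the models fit above
AIC(m1, m2, m3)
# a nested F-test for the linear vs quadratic fit
anova(m1, m2)
```

Because the data were simulated with a quadratic term, `m2` and `m3` should beat `m1` decisively here.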
plot(ggeffects::ggpredict(m3, terms = c("weight_c")),
     add.data = TRUE,
     dot.alpha = .2) +
  labs(title = "simulated spline relationship") +
  xlab("simulated weight") + ylab("simulated height")
Check linear model
performance::check_model(m1)
Check quadratic model
performance::check_model(m2)
Check splines model
performance::check_model(m3)
Here we simulate a difference between two groups in an outcome
N <- 200 # number of observations
#group <- rep((0:1), length.out = 200) # 2 groups
group <- rep(c("m","n_m"), each = N/2) #equivalent:
a <- rnorm(N, 150, 3) # intercept
b1 <- rnorm(N, 20, 1) # coefficient of "b
sigma = rexp(N,1)# error term
outcome <- rnorm(N, mean = a + b1 * (group == "m"), sigma)
df <-data.frame(outcome,group)
dplyr::glimpse(df)
Rows: 200
Columns: 2
$ outcome <dbl> 169.2349, 163.9593, 172.4746, 170.9098, 171.0012, 17…
$ group <chr> "m", "m", "m", "m", "m", "m", "m", "m", "m", "m", "m…
#model removing the intercept to show the difference
ms<-lm(outcome ~ -1 + group, data = df)
ms
Call:
lm(formula = outcome ~ -1 + group, data = df)
Coefficients:
groupm groupn_m
170.4 150.0
# with the intercept removed, each coefficient is a group mean;
# the groups differ by about 20, the simulated b1
sjPlot::plot_model(ms)
Is imbalance in my study wrecking my inference?
N <- 120
cells <-rep( letters[1:2], times = c(15, 105))
a <- rnorm(N, 2, 1)
b1 <- rnorm(N, .2, .1)
sigma <- rexp(N,1)
out <- rnorm(N, mean = a + b1 * (cells == "b"), sigma)
dfc<-data.frame(out,cells)
sim_cells<-lm(out ~ cells, data = dfc)
summary(sim_cells)
Call:
lm(formula = out ~ cells, data = dfc)
Residuals:
Min 1Q Median 3Q Max
-4.3571 -0.9177 -0.1441 0.8207 8.6009
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.5318 0.4666 3.283 0.00135 **
cellsb 0.8271 0.4988 1.658 0.09996 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.807 on 118 degrees of freedom
Multiple R-squared: 0.02277, Adjusted R-squared: 0.01449
F-statistic: 2.749 on 1 and 118 DF, p-value: 0.09996
This isn’t too convincing on its own: we need to replicate the simulation many times.
# Make a function for the simulation
set.seed(12)
test_fun = function() {
N <- 120
cells <-rep( letters[1:2], times = c(110, 10))
a <- rnorm(N, 2, 1)
b1 <- rnorm(N, 1, .1)
sigma <- rexp(N, 1)
out <- rnorm(N, mean = a + b1 * (cells == "b"), sigma)
dfc <- data.frame(out, cells)
sim_cells <- lm(out ~ cells, data = dfc)
sim_cells
}
sim_lm = replicate(20, test_fun(), simplify = FALSE )
length(sim_lm)
[1] 20
We can use the purrr package to generate many replicates of a model
library(dplyr)
tab_sim<-purrr::map_dfr(sim_lm, broom::tidy)
tab_sim %>%
  dplyr::mutate_if(is.numeric, round, 5)
# A tibble: 40 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 1.89 0.143 13.2 0
2 cellsb 1.07 0.497 2.15 0.0338
3 (Intercept) 1.88 0.174 10.8 0
4 cellsb 1.74 0.603 2.88 0.00472
5 (Intercept) 1.96 0.140 14.1 0
6 cellsb 2.08 0.484 4.29 0.00004
7 (Intercept) 1.90 0.179 10.6 0
8 cellsb 0.894 0.621 1.44 0.152
9 (Intercept) 1.85 0.192 9.64 0
10 cellsb 0.162 0.664 0.244 0.808
# … with 30 more rows
What percentage of simulations yield "significant" results?
sum(tab_sim$p.value <= .05) / length(tab_sim$p.value)
[1] 0.725
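Note that `tab_sim$p.value` pools the intercept and slope rows of each model; a sketch restricting the count to the `cellsb` coefficient, which is the effect of interest:

```r
# keep only the group-coefficient rows before computing the
# proportion of "significant" replicates (tab_sim as built above)
library(dplyr)
slope_p <- tab_sim %>%
  dplyr::filter(term == "cellsb") %>%
  dplyr::pull(p.value)
mean(slope_p <= .05)
```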
Does balance fix the issue?
# Make a function for the simulation
set.seed(12)
test_fun = function() {
N <- 120
cells <-rep( letters[1:2], times = c(60, 60))
a <- rnorm(N, 2, 1)
b1 <- rnorm(N, 1, .1)
sigma <- rexp(N, 1)
out <- rnorm(N, mean = a + b1 * (cells == "b"), sigma)
dfc <- data.frame(out, cells)
sim_cells <- lm(out ~ cells, data = dfc)
sim_cells
}
sim_lm = replicate(20, test_fun(), simplify = FALSE )
length(sim_lm)
[1] 20
We can use the purrr package to generate many replicates of a model
library(dplyr)
tab_sim<-purrr::map_dfr(sim_lm, broom::tidy)
tab_sim %>%
  dplyr::mutate_if(is.numeric, round, 5)
# A tibble: 40 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 1.87 0.193 9.69 0
2 cellsb 1.06 0.273 3.90 0.00016
3 (Intercept) 2.10 0.237 8.87 0
4 cellsb 0.665 0.335 1.98 0.0495
5 (Intercept) 1.84 0.190 9.66 0
6 cellsb 1.46 0.269 5.43 0
7 (Intercept) 1.71 0.242 7.09 0
8 cellsb 1.34 0.342 3.93 0.000140
9 (Intercept) 1.97 0.259 7.59 0
10 cellsb 0.625 0.366 1.71 0.0902
# … with 30 more rows
sum(tab_sim$p.value <= .05) / length(tab_sim$p.value)
[1] 0.975
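The two scenarios above differ only in cell sizes, so one way to make the comparison direct is to parameterise the simulation. A sketch (the function name `power_sim` is ours, and it restricts attention to the `cellsb` term):

```r
library(dplyr)
library(purrr)
library(broom)

# estimate the proportion of replicates with p <= .05 for the group
# coefficient, given n_a observations in cell "a" and n_b in cell "b"
power_sim <- function(n_a, n_b, reps = 50) {
  fits <- purrr::map_dfr(seq_len(reps), function(i) {
    N <- n_a + n_b
    cells <- rep(letters[1:2], times = c(n_a, n_b))
    a <- rnorm(N, 2, 1)
    b1 <- rnorm(N, 1, .1)
    sigma <- rexp(N, 1)
    out <- rnorm(N, mean = a + b1 * (cells == "b"), sigma)
    broom::tidy(lm(out ~ cells, data = data.frame(out, cells)))
  })
  fits %>%
    dplyr::filter(term == "cellsb") %>%
    dplyr::summarise(power = mean(p.value <= .05)) %>%
    dplyr::pull(power)
}

set.seed(12)
power_sim(110, 10) # imbalanced
power_sim(60, 60)  # balanced
```

With more replicates the balanced design should show the higher "significance rate", as in the two runs above.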
The rnorm function
rnorm(n = 20, mean = 0, sd = 1)
 [1] -0.35670532 -1.24881178 -0.89924439 -0.43286430 -0.38570833
[6] -1.37923867 -1.01461518 0.49397773 -0.76476629 -0.24591544
[11] -0.72134613 -0.35602875 -1.35227019 -0.57471502 -0.39967019
[16] -0.51779937 0.51186624 0.81832323 -0.53839737 -0.03936271
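Each call to `rnorm` returns fresh draws; `set.seed` makes a simulation reproducible. A minimal sketch:

```r
# the same seed yields the same pseudo-random draws
set.seed(447)
x1 <- rnorm(n = 20, mean = 0, sd = 1)
set.seed(447)
x2 <- rnorm(n = 20, mean = 0, sd = 1)
identical(x1, x2) # TRUE
```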
The approach to simulation presented here owes to Ariel, who has many resources on her website; please check them out.
Later, we’ll prefer a different way of writing regression equations in math. (Note: writing math isn’t math - it’s just encoding the model that we’ve written).↩︎
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY-NC-SA 4.0. Source code is available at https://go-bayes.github.io/psych-447/, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Bulbulia (2021, March 23). Psych 447: Samples, parameters, and elements of a linear model. Retrieved from https://vuw-psych-447.netlify.app/posts/5_1/
BibTeX citation
@misc{bulbulia2021samples,
  author = {Bulbulia, Joseph},
  title = {Psych 447: Samples, parameters, and elements of a linear model},
  url = {https://vuw-psych-447.netlify.app/posts/5_1/},
  year = {2021}
}